Week 3: Approximate Near Neighbor Search
Abstract
Last week we discussed the randomized O(n) expected time algorithm to compute the closest pair of points in the plane. This week we continue with a related data structure problem: (approximate) near neighbor search. Suppose you have a database D of n entries: images, text documents, census data, etc. Using a Dictionary data structure, for example a B-tree or a hash table, you can check if an entry x is in D. But sometimes you actually want to find an entry y in the database which is as “close” as possible to x, and not necessarily exactly the same. This is one of the most basic ways to do classification, for example: you keep a database of labeled images, and given a new image you want to see if it is close to any image in the database; if your new image is close to an image of a dog, then maybe it is also an image of a dog. Or, more simply, databases can contain errors, and looking for an approximate match is a way to make our database search more robust.
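To make the contrast between an exact dictionary lookup and a near neighbor query concrete, here is a minimal sketch. It assumes entries are points in the plane compared by Euclidean distance and uses a plain linear scan; the database contents, labels, and the function name `nearest_neighbor` are hypothetical and chosen only for illustration. It is not the data structure developed in the notes.

```python
import math

def nearest_neighbor(database, query):
    """Return the (point, label) entry whose point is closest to `query`.

    Brute-force linear scan over all n entries, using Euclidean distance.
    """
    return min(database, key=lambda entry: math.dist(entry[0], query))

# Hypothetical labeled database: each entry is (point, label).
database = [
    ((0.0, 0.0), "cat"),
    ((5.0, 5.0), "dog"),
    ((9.0, 1.0), "bird"),
]

# Exact membership test with a dictionary (hash table).
lookup = {point: label for point, label in database}

query = (4.6, 5.3)
exact = lookup.get(query)                         # no exact match, so None
point, label = nearest_neighbor(database, query)  # closest stored entry
```

The dictionary lookup fails because the query is not stored verbatim, while the scan still returns the closest labeled entry. The scan inspects every entry on each query, which is the cost that near neighbor data structures aim to reduce.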
Similar resources
Graph-based time-space trade-offs for approximate near neighbors
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size n = 2^{o(d)} on the d-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approx...
Coding for Random Projections and Approximate Near Neighbor Search
This technical note compares two coding (quantization) schemes for random projections in the context of sub-linear time approximate near neighbor search. The first scheme is based on uniform quantization [4] while the second scheme utilizes a uniform quantization plus a uniformly random offset [1] (which has been popular in practice). The prior work [4] compared the two schemes in the ...
A Replacement for Voronoi Diagrams of Near Linear Size
A compressed quadtree-based replacement for approximate Voronoi diagrams with near linear complexity, using hierarchical clustering and prioritized point location among balls, with applications to improved approximate nearest neighbour search using point location among equal balls, to fat triangulations of proximity diagrams in two and higher dimensions, and to fast approximate proximity search.
Learning Vocabulary-Based Hashing with AdaBoost
Approximate near neighbor search plays a critical role in various kinds of multimedia applications. The vocabulary-based hashing scheme uses vocabularies, i.e. selected sets of feature points, to define a hash function family. The function family can be employed to build an approximate near neighbor search index. The critical problem in vocabulary-based hashing is the criteria of choosing vocab...
SIMP: Accurate and Efficient Near Neighbor Search in Very High Dimensional Spaces
Near neighbor search in very high dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only for the approximate case. These solutions are designed to solve r-near neighbor queries only for a fixed query range or a set of query ranges with probabilistic guarantees and then, extended for nearest neighbor queries. Solutions supporting a set of query...
Publication date: 2018